First, I identified which variables are time dependent and excluded those that were clearly unnecessary (i.e., “SITE”, “COLPROT”, “ORIGPROT”, “FLDSTRENG”, “FSVERSION”, “IMAGEUID”, “Month_bl”, “Month”, “M”, “update_stamp”).
Next, I merged the time-dependent and time-independent variables into the long_dat data frame and recoded the time points in the VISCODE variable as integers (baseline = 0).
long_dat <- dat[, c(ivars[, 1], nivars[, 1])] %>%
  mutate(VISCODE = match(VISCODE, c("bl", "m03", "m06", "m12", "m18", "m24",
                                    "m30", "m36", "m42", "m48", "m54", "m60",
                                    "m66", "m72", "m78", "m84", "m90", "m96",
                                    "m102", "m108", "m114", "m120", "m126",
                                    "m132", "m144", "m156")) - 1) %>%
  relocate(RID, PTID, VISCODE) %>%
  arrange(RID, VISCODE)
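To make the recoding concrete, the following minimal sketch applies the same `match(...) - 1` logic to a few visit codes. The helper name `recode_visit` and the code "m99" are made up for illustration; only the list of visit codes comes from the step above.

```r
# Visit codes in the same order as in the recoding step above.
visit_levels <- c("bl", "m03", "m06", "m12", "m18", "m24",
                  "m30", "m36", "m42", "m48", "m54", "m60",
                  "m66", "m72", "m78", "m84", "m90", "m96",
                  "m102", "m108", "m114", "m120", "m126",
                  "m132", "m144", "m156")

# match() returns the 1-based position of each code in visit_levels,
# so subtracting 1 makes baseline ("bl") time point 0.
recode_visit <- function(viscode) match(viscode, visit_levels) - 1L

example_codes <- c("bl", "m06", "m48", "m99")  # "m99" is a made-up code
recoded <- recode_visit(example_codes)
print(data.frame(VISCODE = example_codes, time_point = recoded))
```

Note that the resulting integers are positions in the visit list, not months (for example, "m48" becomes time point 9), and that any code missing from the list is silently recoded to NA.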
In the original data frame there were quite a few _bl or _BL (baseline) variables. I therefore checked whether these columns had already been integrated at the corresponding time point for each participant. They had not.
I therefore merged each _bl/_BL variable with its corresponding time-dependent variable for each participant. I also specified the data type of each variable individually to keep full control over the data structure.
Transform Long to Wide Data Format
## # A tibble: 6 × 1,153
## RID PTID AGE PTGENDER PTEDUCAT PTETHCAT PTRACCAT PTMARRY APOE4 FDG_0
## <fct> <chr> <dbl> <fct> <int> <fct> <fct> <fct> <int> <dbl>
## 1 2 011_S_0002 74.3 Male 16 Not His… White Married 0 1.37
## 2 3 011_S_0003 81.3 Male 18 Not His… White Married 1 1.08
## 3 4 022_S_0004 67.5 Male 10 Hisp/La… White Married 0 NA
## 4 5 011_S_0005 73.7 Male 16 Not His… White Married 0 1.29
## 5 6 100_S_0006 80.4 Female 13 Not His… White Married 0 NA
## 6 7 022_S_0007 75.4 Male 10 Hisp/La… More th… Married 1 NA
## # ℹ 1,143 more variables: FDG_2 <dbl>, FDG_7 <dbl>, FDG_11 <dbl>, FDG_12 <dbl>,
## # FDG_13 <dbl>, FDG_14 <dbl>, FDG_15 <dbl>, FDG_16 <dbl>, FDG_17 <dbl>,
## # FDG_18 <dbl>, FDG_19 <dbl>, FDG_21 <dbl>, FDG_22 <dbl>, FDG_23 <dbl>,
## # FDG_24 <dbl>, FDG_3 <dbl>, FDG_4 <dbl>, FDG_5 <dbl>, FDG_6 <dbl>,
## # FDG_9 <dbl>, FDG_8 <dbl>, FDG_10 <dbl>, FDG_25 <dbl>, FDG_20 <dbl>,
## # FDG_1 <dbl>, PIB_0 <dbl>, PIB_2 <dbl>, PIB_7 <dbl>, PIB_11 <dbl>,
## # PIB_12 <dbl>, PIB_13 <dbl>, PIB_14 <dbl>, PIB_15 <dbl>, PIB_16 <dbl>, …
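The code for the long-to-wide step is not shown here, so the following is only a sketch of one way to obtain columns named like `FDG_0`, `FDG_2`, ... from long data. It uses base R's `reshape()` on a made-up toy data frame; the original analysis may well have used a different function.

```r
# Toy long-format data: one row per participant (RID) and time point (VISCODE).
long_toy <- data.frame(
  RID     = c(2, 2, 3, 3, 4),
  VISCODE = c(0, 2, 0, 2, 0),
  FDG     = c(1.37, 1.30, 1.08, 1.02, NA),
  MMSE    = c(28, 27, 20, 18, 29)
)

# One row per participant; every measurement gets a "<name>_<time point>" column.
wide_toy <- reshape(long_toy,
                    idvar     = "RID",
                    timevar   = "VISCODE",
                    direction = "wide",
                    sep       = "_")
print(wide_toy)
```

Participants without a visit at a given time point (here RID 4 at time point 2) receive NA in the corresponding wide columns, which is why the wide table contains many missing values.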
Based on the number of participants measured at any time point I made a frequency plot to get a first idea of the sampling frequency.
Based on these findings, time point 9 appears to be a cut-off after which the number of measurements drops quite strongly. With the recoding above, time point 9 corresponds to month 48 (i.e., 4 years) of the follow-up.
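The counts behind such a frequency plot can be obtained as in the sketch below, which counts distinct participants per time point. The toy data are invented; only the column names `RID` and `VISCODE` are taken from the analysis.

```r
# Toy long-format data: participant 1 has three visits, participant 3 only baseline.
toy <- data.frame(
  RID     = c(1, 1, 1, 2, 2, 3),
  VISCODE = c(0, 1, 2, 0, 2, 0)
)

# Number of distinct participants measured at each time point.
n_per_visit <- tapply(toy$RID, toy$VISCODE, function(id) length(unique(id)))
print(n_per_visit)
```

Passing `n_per_visit` to `barplot()` would give a simple version of the frequency plot.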
With its default settings, the merge(by.x, by.y) function creates a new data frame that keeps only those rows for which there is a matching key in both inputs (in our case PTID). As a result, the 2 additional individuals for whom we have genetic data but no other measurements are dropped. The final data frame, for which both testing data and genetic data are available, thus contains N = 1408 participants.
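The behaviour of `merge()` described above can be checked on a small example. The two toy data frames below are invented; `PTID` is the key used in the analysis.

```r
# Toy clinical and genetic tables sharing the key PTID.
clinical <- data.frame(PTID = c("A", "B", "C"), MMSE = c(29, 27, 22))
genetic  <- data.frame(PTID = c("B", "C", "D", "E"),
                       EA22 = c(0.4, -1.1, 0.9, 0.2))

# Default merge() is an inner join: only PTIDs present in both tables survive.
both <- merge(clinical, genetic, by = "PTID")
print(both)

# Individuals with genetic data but no clinical measurements.
only_genetic <- setdiff(genetic$PTID, clinical$PTID)
print(only_genetic)
```

Setting `all = TRUE` (or `all.x` / `all.y`) would keep the unmatched rows and fill the missing columns with NA instead.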
Based on this plot, we can see a positive relationship between the polygenic score (PGS) for educational attainment and actual years of education. In other words, a higher PGS, which we interpret as a higher genetic capacity for educational attainment, goes along with more years of education.
We ran Pearson’s correlation, which resulted in r = 0.286 (p-value < 2.2e-16).
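A correlation of this kind can be computed with `cor.test()`, as in the sketch below. The data are simulated, so the numbers it prints are unrelated to the result reported above; the variable names `pgs` and `educ` are placeholders.

```r
set.seed(1)

# Simulated polygenic score and years of education with a positive association.
n    <- 200
pgs  <- rnorm(n)
educ <- 16 + 0.8 * pgs + rnorm(n, sd = 2.5)

# Pearson correlation with its test of H0: r = 0.
ct <- cor.test(pgs, educ, method = "pearson")
print(ct$estimate)  # correlation coefficient r
print(ct$p.value)   # p-value of the test
```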
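# Linear regression model: years of education on the polygenic score (EA22),
# adjusted for age and sex
model <- lm(PTEDUCAT ~ EA22 + AGE + PTGENDER, data = long_dat)
# Check model assumptions (check_model() is provided by the performance package)
check_model(model)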
To obtain the residuals, we regressed actual educational attainment (years of education) on the polygenic score for educational attainment, including sex and age as covariates. The distribution of the residuals is depicted in the density plot.
It is important to interpret the residual scores correctly. A high (positive) residual means that the individual has over-performed relative to his or her genetic capacity, and a negative residual means under-performance. The table below illustrates this: each residual equals the actual value minus the predicted value.
## Actual Predicted Residuals
## 1 18 16.91911 1.0808864
## 2 16 15.16815 0.8318481
## 3 12 16.64336 -4.6433625
## 4 20 16.02560 3.9743989
## 5 14 14.83958 -0.8395765
## 6 13 15.37284 -2.3728412
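The relationship shown in the table (residual = actual minus predicted) can be verified directly on any fitted model. The sketch below uses simulated data with the same variable names as the model above; the simulated values are assumptions and do not reproduce the table.

```r
set.seed(42)

# Simulated stand-in for the analysis data.
n <- 100
toy <- data.frame(
  EA22     = rnorm(n),
  AGE      = runif(n, 55, 90),
  PTGENDER = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
toy$PTEDUCAT <- round(16 + 1.0 * toy$EA22 + rnorm(n, sd = 2.5))

fit <- lm(PTEDUCAT ~ EA22 + AGE + PTGENDER, data = toy)

# Actual, predicted and residual values side by side.
check <- data.frame(
  Actual    = toy$PTEDUCAT,
  Predicted = unname(fitted(fit)),
  Residuals = unname(resid(fit))
)
print(head(check))

# A positive residual means more education than the model predicts.
residual_matches <- isTRUE(all.equal(check$Actual - check$Predicted,
                                     check$Residuals))
print(residual_matches)
```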
Using the ntile function from dplyr, the residuals were split into tertiles: the lower tertile is assigned the value 1 (mostly negative residuals), the middle tertile the value 2, and the upper tertile the value 3 (mostly positive residuals). The follow-up is limited to the 9th follow-up (i.e., 48 months).
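The tertile assignment can be illustrated without any package. The helper `rank_based_tertile` below is a hypothetical base-R stand-in that buckets values by rank, which is the idea behind `dplyr::ntile(x, 3)`; it is not the function used in the analysis, and the two may differ in how they distribute observations when the sample size is not divisible by 3.

```r
# Made-up residuals (years of education above or below the prediction).
residuals_toy <- c(-4.6, 1.1, 0.8, 4.0, -0.8, -2.4, 2.2, -1.5, 0.1)

# Rank the values, then cut the ranks into three equally sized groups.
rank_based_tertile <- function(x) {
  r <- rank(x, ties.method = "first", na.last = "keep")
  as.integer(ceiling(3 * r / sum(!is.na(x))))
}

tertile <- rank_based_tertile(residuals_toy)
print(data.frame(residual = residuals_toy, tertile = tertile))
```

Here the three most negative residuals fall into tertile 1 and the three most positive into tertile 3, matching the interpretation given above.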
“The mini–mental state examination (MMSE) is a 30-point questionnaire that is used extensively in clinical and research settings to measure cognitive impairment. It is commonly used in medicine and allied health to screen for dementia. It is also used to estimate the severity and progression of cognitive impairment and to follow the course of cognitive changes in an individual over time; thus making it an effective way to document an individual’s response to treatment. Administration of the test takes between 5 and 10 minutes and examines functions including registration (repeating named prompts), attention and calculation, recall, language, ability to follow simple commands and orientation. […] Any score of 24 or more (out of 30) indicates a normal cognition. Below this, scores can indicate severe (≤9 points), moderate (10–18 points) or mild (19–23 points) cognitive impairment.” (Wikipedia.org). The MMSE scores were normalized using the NormPsy package and then the cut-off was calculated.
To see whether it is necessary to stratify by age group, the effects of the polygenic score for EA and of age group were tested using linear regression, once for the raw MMSE score and once for the normalized MMSE score. The results are displayed below.
##
## Call:
## lm(formula = MMSE ~ EA22 + Age_Group, data = long_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.415 -1.189 1.109 2.458 3.473
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.12540 0.16657 162.844 < 2e-16 ***
## EA22 0.45636 0.11384 4.009 6.19e-05 ***
## Age_Group -0.01018 0.06723 -0.151 0.88
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.642 on 4822 degrees of freedom
## (6151 observations deleted due to missingness)
## Multiple R-squared: 0.003326, Adjusted R-squared: 0.002912
## F-statistic: 8.045 on 2 and 4822 DF, p-value: 0.0003249
##
## Call:
## lm(formula = MMSE_norm ~ EA22 + Age_Group, data = long_dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.457 -14.481 1.461 21.439 29.791
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 75.4231 0.9389 80.330 < 2e-16 ***
## EA22 3.7284 0.6416 5.811 6.62e-09 ***
## Age_Group -0.1925 0.3790 -0.508 0.611
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.53 on 4822 degrees of freedom
## (6151 observations deleted due to missingness)
## Multiple R-squared: 0.006955, Adjusted R-squared: 0.006543
## F-statistic: 16.89 on 2 and 4822 DF, p-value: 4.923e-08
Next the survival analysis was conducted for the genetic capacity of educational attainment and the residual educational attainment.
The log-rank tests comparing the high and low tertiles gave p-values of 3.354398 × 10^{-5} for genetic capacity and 4.5572169 × 10^{-23} for residual educational attainment, respectively. In the next step, the survival analysis was conducted stratified by genetic capacity for educational attainment.
Log Test Low: 7.2954207 × 10^{-10}
Log Test Middle: 1.5611391 × 10^{-16}
Log Test High: 3.2760959 × 10^{-5}
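For readers unfamiliar with the test, the sketch below runs a log-rank comparison of two groups with `survdiff()` from the survival package and derives the p-value from the chi-square statistic. The data are simulated, and the column names `tertile`, `visit` and `event` are placeholders for the tertile variable, the integer time point and the cut-off indicator used in the analysis.

```r
library(survival)

set.seed(7)

# Simulated data: two groups (lowest and highest tertile), a visit number as
# the time variable and a 0/1 indicator for having crossed a cut-off.
n <- 300
toy <- data.frame(
  tertile = rep(c(1, 3), each = n / 2),
  visit   = sample(1:9, n, replace = TRUE)
)
toy$event <- rbinom(n, 1, ifelse(toy$tertile == 1, 0.45, 0.30))

# Log-rank test of the difference between the two groups.
fit <- survdiff(Surv(visit, event) ~ tertile, data = toy)
print(fit)

# p-value from the chi-square statistic (degrees of freedom = groups - 1).
p_value <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
print(p_value)
```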
The Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) consists of 11 tasks, comprising both subject-completed tests and observer-based assessments, that assess the memory, language, and praxis domains. The result is a global final score ranging from 0 to 70, based on the sum of the scores of the single tasks (ADAS11).
Beyond the ADAS11 score, the ADNI study also included an additional test of delayed word recall and a number cancellation or maze task, which are added to give a new total score that ranges from 0 to 85 (ADAS13).
In addition, the score of task 4 (Word Recognition, ADASQ4) was included in the ADNIMERGE dataset.
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADAS11_cut) ~ thirtile_res,
## data = .)
##
## n=5540, 5 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 2799 1198 1008 36.0 81.7
## thirtile_res=3 2741 800 990 36.6 81.7
##
## Chisq= 81.7 on 1 degrees of freedom, p= <2e-16
“The ADAS13 was included as a global measure of cognitive function. ADAS13 is a test battery developed to assess severity of cognitive impairment associated with AD and includes subtests and clinical evaluations assessing memory function, reasoning, language function, orientation and praxis. The ADAS13 is a modified version of the original ADAS-Cog-11, adding a cancellation task and a delayed free recall task. The higher the scores, the more severe impairment of cognitive function.” (Mofrad et al., 2021)
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADAS13_cut) ~ thirtile_res,
## data = .)
##
## n=7290, 27 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3649 1619 1354 51.8 111
## thirtile_res=3 3641 1095 1360 51.6 111
##
## Chisq= 111 on 1 degrees of freedom, p= <2e-16
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), ADASQ4_cut) ~ thirtile_res,
## data = .)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3659 1492 1273 37.8 80.2
## thirtile_res=3 3658 1067 1286 37.4 80.2
##
## Chisq= 80.2 on 1 degrees of freedom, p= <2e-16
“The clinical dementia rating (CDR) scale is commonly used to diagnose dementia due to Alzheimer’s disease (AD). The sum of boxes of the CDR (CDR-SB) has recently been emphasized and applied to interventional trials for tracing the progression of cognitive impairment (CI) in the early stages of AD.” (Tzeng et al., 2022)
See Table 3 for an explanation of the staging categories (O’Bryant et al., 2012).
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), CDRSB_cut) ~ thirtile_res,
## data = .)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3659 2779 2566 17.6 39.8
## thirtile_res=3 3658 2384 2597 17.4 39.8
##
## Chisq= 39.8 on 1 degrees of freedom, p= 3e-10
“The DSST (Digit Symbol Substitution Test) is a paper-and-pencil cognitive test presented on a single sheet of paper that requires a subject to match symbols to numbers according to a key located on the top of the page. The subject copies the symbol into spaces below a row of numbers. The number of correct symbols within the allowed time, usually 90 to 120 seconds, constitutes the score.” (Jaeger, 2018) The lower the scores, the more severe impairment of cognitive function.
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), DIGITSCOR_cut) ~
## thirtile_res, data = .)
##
## n=4029, 3288 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 2146 455 367 20.9 45
## thirtile_res=3 1883 251 339 22.7 45
##
## Chisq= 45 on 1 degrees of freedom, p= 2e-11
The Functional Activities Questionnaire (FAQ) is used to assess an individual’s functional abilities in daily living activities. It is a caregiver-based questionnaire that helps evaluate how well a person is able to perform various instrumental activities of daily living (IADLs) and basic activities of daily living (ADLs). (ChatGPT) The total is a sum score (range 0–30). The score range for each item is 0–3 (higher scores indicate greater impairment; 0 = normal or never did but could do now; 1 = has difficulty but does by self or never did but would have difficulty now; 2 = requires assistance; 3 = dependent). There is no established cut-off score for IADL impairment on the FAQ. However, one study reported that a total FAQ score (sum of all 10 item scores; range 0–30) of ≥ 6 is suggestive of functional impairment [20]. (Marshall et al., 2015)
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), FAQ_cut) ~ thirtile_res,
## data = .)
##
## n=7308, 9 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3659 1281 1090 33.5 70.9
## thirtile_res=3 3649 908 1099 33.2 70.9
##
## Chisq= 70.9 on 1 degrees of freedom, p= <2e-16
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), LDELTOTAL_cut) ~
## thirtile_res, data = .)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3659 2017 1587 117 251
## thirtile_res=3 3658 1174 1604 115 251
##
## Chisq= 251 on 1 degrees of freedom, p= <2e-16
Reference literature: doi: 10.1111/j.1532-5415.2005.53221.x
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), MOCA_cut) ~ thirtile_res,
## data = .)
##
## n=3896, 3421 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1819 1348 1277 3.97 8.91
## thirtile_res=3 2077 1310 1381 3.67 8.91
##
## Chisq= 8.9 on 1 degrees of freedom, p= 0.003
The RAVLT was included as a measure of memory function. In this test, the participants are asked to recall words from a list of 15 nouns immediately after each of five learning trials and after a short and a long delay. Two measures known to be sensitive to cognitive changes in patients with AD were included in the present study: Immediate recall (RAVLT-Im): the number of correct responses across the immediate recall of the five learning trials; percent forgetting (RAVLT-PF): the score on the fifth learning trial minus the score on the long delayed recall, divided by the score obtained on the fifth learning trial. The lower the scores, the more severe impairment of cognitive function.
Different summary scores are derived from raw RAVLT scores. These include RAVLT Immediate (the sum of scores from the first 5 trials (Trials 1 to 5)), RAVLT Learning (the score of Trial 5 minus the score of Trial 1), RAVLT Forgetting (the score of Trial 5 minus the score of the delayed recall) and RAVLT Percent Forgetting (RAVLT Forgetting divided by the score of Trial 5). We use the naming of the ADNIMERGE table for these summary measures. We investigated the relationship between MRI measures and RAVLT cognitive test scores by estimating the RAVLT Immediate and RAVLT Percent Forgetting from the gray matter density. These two summary scores were selected since they highlight different aspects of episodic memory, learning (RAVLT Immediate) and delayed memory (RAVLT Percent Forgetting), essential to AD, and previous studies (Estévez-González et al., 2003, Wang et al., 2011, Gomar et al., 2014, Moradi et al., 2015) have indicated strong relationships between these two RAVLT measures and Alzheimer’s disease.
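The four summary scores defined above can be written out as a short calculation. The trial scores below are invented, and the column names (`trial1` to `trial5`, `delayed`) are placeholders rather than ADNIMERGE variable names. Percent forgetting is computed as a proportion, exactly as defined in the text; multiplying by 100 to express it as a percentage would be an additional assumption.

```r
# Made-up raw RAVLT scores for two participants (words recalled out of 15).
ravlt <- data.frame(
  trial1 = c(4, 6), trial2 = c(6, 8), trial3 = c(7, 10),
  trial4 = c(8, 11), trial5 = c(9, 12), delayed = c(3, 10)
)

trial_cols <- paste0("trial", 1:5)

ravlt$immediate       <- rowSums(ravlt[, trial_cols])   # sum of Trials 1 to 5
ravlt$learning        <- ravlt$trial5 - ravlt$trial1    # Trial 5 minus Trial 1
ravlt$forgetting      <- ravlt$trial5 - ravlt$delayed   # Trial 5 minus delayed recall
ravlt$perc_forgetting <- ravlt$forgetting / ravlt$trial5

print(ravlt)
```

The first participant learns five additional words across the trials but loses six of the nine words from Trial 5 after the delay, whereas the second loses only two of twelve.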
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_immediate_cut) ~
## thirtile_res, data = .)
##
## n=7299, 18 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3651 2612 2796 12.1 27.9
## thirtile_res=3 3648 3006 2822 12.0 27.9
##
## Chisq= 27.9 on 1 degrees of freedom, p= 1e-07
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_perc_forgetting_cut) ~
## thirtile_res, data = .)
##
## n=7290, 27 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3642 1216 1072 19.4 40.8
## thirtile_res=3 3648 940 1084 19.2 40.8
##
## Chisq= 40.8 on 1 degrees of freedom, p= 2e-10
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_forgetting_cut) ~
## thirtile_res, data = .)
##
## n=7299, 18 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3651 68 83.5 2.89 5.77
## thirtile_res=3 3648 100 84.5 2.85 5.77
##
## Chisq= 5.8 on 1 degrees of freedom, p= 0.02
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), RAVLT_learning_cut) ~
## thirtile_res, data = .)
##
## n=7299, 18 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3651 39 44.2 0.608 1.21
## thirtile_res=3 3648 50 44.8 0.599 1.21
##
## Chisq= 1.2 on 1 degrees of freedom, p= 0.3
The Trail Making Test is a neuropsychological test of visual attention and task switching. It has two parts, in which the subject is instructed to connect a set of 25 dots as quickly as possible while maintaining accuracy.
The test can provide information about visual search speed, scanning, speed of processing, mental flexibility, and executive functioning. It is sensitive to cognitive impairment associated with dementia, including Alzheimer’s disease. (ChatGPT)
Record the total number of seconds to complete Part B (Trails B), up to a maximum of 300 seconds. If the participant is not finished by 300 seconds, the score is 300.
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), TRABSCOR_cut) ~
## thirtile_res, data = .)
##
## n=7252, 65 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 3619 933 738 51.3 106
## thirtile_res=3 3633 554 749 50.6 106
##
## Chisq= 106 on 1 degrees of freedom, p= <2e-16
The original version of the ECog is an informant-based measure of cognitively-relevant everyday abilities comprised of 39 items, covering six cognitively-relevant domains: Everyday Memory, Everyday Language, Everyday Visuospatial Abilities, Everyday Planning, Everyday Organization, and Everyday Divided Attention. Ratings are made on a four-point scale: 1 = better or no change compared to 10 years earlier, 2 = questionable/occasionally worse, 3 = consistently a little worse, 4 = consistently much worse. (Tomaszewski Farias et al., 2012)
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtDivatt_cut) ~
## thirtile_res, data = .)
##
## n=3888, 3429 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1820 194 222 3.44 6.7
## thirtile_res=3 2068 272 244 3.11 6.7
##
## Chisq= 6.7 on 1 degrees of freedom, p= 0.01
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtLang_cut) ~
## thirtile_res, data = .)
##
## n=3919, 3398 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1830 309 273 4.63 9.11
## thirtile_res=3 2089 266 302 4.20 9.11
##
## Chisq= 9.1 on 1 degrees of freedom, p= 0.003
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtMem_cut) ~
## thirtile_res, data = .)
##
## n=3925, 3392 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1828 341 349 0.188 0.37
## thirtile_res=3 2097 396 388 0.169 0.37
##
## Chisq= 0.4 on 1 degrees of freedom, p= 0.5
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtOrgan_cut) ~
## thirtile_res, data = .)
##
## n=3855, 3462 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1787 299 319 1.26 2.48
## thirtile_res=3 2068 377 357 1.13 2.48
##
## Chisq= 2.5 on 1 degrees of freedom, p= 0.1
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtPlan_cut) ~
## thirtile_res, data = .)
##
## n=3915, 3402 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1828 442 415 1.74 3.48
## thirtile_res=3 2087 431 458 1.58 3.48
##
## Chisq= 3.5 on 1 degrees of freedom, p= 0.06
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtVisspat_cut) ~
## thirtile_res, data = .)
##
## n=3897, 3420 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1827 351 335 0.719 1.43
## thirtile_res=3 2070 350 366 0.660 1.43
##
## Chisq= 1.4 on 1 degrees of freedom, p= 0.2
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogPtTotal_cut) ~
## thirtile_res, data = .)
##
## n=3919, 3398 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1830 366 356 0.258 0.512
## thirtile_res=3 2089 384 394 0.234 0.512
##
## Chisq= 0.5 on 1 degrees of freedom, p= 0.5
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPDivatt_cut) ~
## thirtile_res, data = .)
##
## n=3913, 3404 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1843 632 574 5.76 12
## thirtile_res=3 2070 558 616 5.37 12
##
## Chisq= 12 on 1 degrees of freedom, p= 5e-04
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPLang_cut) ~
## thirtile_res, data = .)
##
## n=3989, 3328 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1872 771 702 6.71 14.1
## thirtile_res=3 2117 683 752 6.27 14.1
##
## Chisq= 14.1 on 1 degrees of freedom, p= 2e-04
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPMem_cut) ~
## thirtile_res, data = .)
##
## n=3989, 3328 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1872 761 707 4.18 8.75
## thirtile_res=3 2117 702 756 3.90 8.75
##
## Chisq= 8.8 on 1 degrees of freedom, p= 0.003
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPOrgan_cut) ~
## thirtile_res, data = .)
##
## n=3850, 3467 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1791 532 505 1.42 2.92
## thirtile_res=3 2059 523 550 1.31 2.92
##
## Chisq= 2.9 on 1 degrees of freedom, p= 0.09
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPPlan_cut) ~
## thirtile_res, data = .)
##
## n=3959, 3358 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1855 701 621 10.23 21.4
## thirtile_res=3 2104 587 667 9.53 21.4
##
## Chisq= 21.4 on 1 degrees of freedom, p= 4e-06
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPVisspat_cut) ~
## thirtile_res, data = .)
##
## n=3953, 3364 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1846 688 616 8.39 17.4
## thirtile_res=3 2107 602 674 7.67 17.4
##
## Chisq= 17.4 on 1 degrees of freedom, p= 3e-05
## Call:
## survdiff(formula = Surv(as.integer(VISCODE), EcogSPTotal_cut) ~
## thirtile_res, data = .)
##
## n=3981, 3336 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## thirtile_res=1 1871 800 725 7.84 16.5
## thirtile_res=3 2110 698 773 7.35 16.5
##
## Chisq= 16.5 on 1 degrees of freedom, p= 5e-05
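# AGE is a decimal number, so use a range comparison instead of %in% 60:70,
# which would only match ages that are exactly whole numbers
test <- long_dat %>% filter(AGE >= 60, AGE <= 70)
# TODO: also test participants aged 65 to 75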